Mining Host-Pathogen Interactions
Abstract
containing HPI information and another abstract that does not. Using a training set of pre-annotated abstracts, the system can then learn to discriminate efficiently between these two abstract types. Moreover, the same characteristic features can be calculated for the individual sentences in an abstract. Thus, we can use the same supervised-learning approach to solve Tasks 1 and 2. Finally, to solve Task 3, one can use a simple dictionary-based search on each sentence classified as containing HPI information.

Our feature-based approach consists of four basic stages (Fig. 1C). First, each abstract is preprocessed to find each protein/gene in the abstract and identify its organism name. Second, a feature vector is generated for each abstract. Third, our supervised learning system is trained on the feature vectors generated from the positive and negative sets. Finally, the trained system is applied to an independent test set of HPI and non-HPI abstracts to assess the approach.

Text preprocessing. We first add the publication title to the abstract as its first sentence. The abstract is then split into individual sentences by detecting sentence termination patterns. In standard text, a basic pattern of a period (.) followed by a space and a capital letter can be used directly to separate sentences. Biomedical (and other scientific) publications, however, pose known challenges for preprocessing. In particular, this simple pattern is not always applicable, because periods often occur in protein names and in abbreviations such as “i.e.”, “e.g.”, and “vs.”. We first identify such cases using a predefined dictionary, replace the periods in these words by spaces, and then apply the basic pattern.
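The sentence-splitting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `split_sentences`, the three-entry abbreviation list, the whitespace normalization, and the choice to append a period to a title that lacks one are all assumptions, and the input text is an invented example.

```python
import re

# Hypothetical abbreviation dictionary; the paper's actual dictionary is not given.
ABBREVIATIONS = ["i.e.", "e.g.", "vs."]

# Basic termination pattern: a period, then a space, then a capital letter.
# The lookarounds keep the period and the capital letter in the sentences.
SENTENCE_END = re.compile(r"(?<=\.) (?=[A-Z])")


def split_sentences(title, abstract, abbreviations=ABBREVIATIONS):
    """Prepend the title to the abstract and split the result into sentences."""
    title = title.strip()
    # Assumption: terminate the title with a period so it forms its own sentence.
    if not title.endswith("."):
        title += "."
    text = title + " " + abstract.strip()

    # Replace periods inside known abbreviations by spaces so that they
    # cannot be mistaken for sentence terminators.
    for abbr in abbreviations:
        text = text.replace(abbr, abbr.replace(".", " "))

    # Apply the basic pattern, then collapse the extra spaces left behind.
    parts = SENTENCE_END.split(text)
    return [" ".join(p.split()) for p in parts if p.strip()]


if __name__ == "__main__":
    sentences = split_sentences(
        "A pathogen effector targets a host kinase",
        "The effector binds the kinase, i.e. its host target, in macrophages. "
        "Controls were included, e.g. Mock-treated cells.",
    )
    for s in sentences:
        print(s)
```

Note that plain string replacement also rewrites an abbreviation that happens to end a sentence, so a production version would need to handle that case separately.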
The next step of the preprocessing stage is to detect the organism and protein/gene names using the entity-tagging software NLProt (Mika and Rost 2004).

Support vector machines in text categorization. The problem of detecting whether an abstract contains HPI information can be formulated as a problem of supervised text categorization, with the goal of classifying abstracts into one of the selected categories. In our case, two categories can be naturally defined: (i) abstracts containing HPI information and (ii) abstracts without HPI information. Formally, given a training set of n objects, each represented as a vector of N numerical features, x_i = (x_1, x_2, ..., x_N), together with its class label y_i ∈ {-1, 1}, the goal is to train a feature-based classifier on the training set. Once training is complete, the classifier can assign a class label from {-1, 1} to any new abstract x. In our approach, we use support vector machines (SVM) (Vapnik 1998), a supervised learning method, which is well established in bioinformatics and has recently been applied to identify abstracts containing host-bacteria interaction
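
For reference, the standard soft-margin SVM with a linear kernel can be written as below. The text does not state which kernel or which regularization setting was used, so the linear form and the symbols w, b, C, and ξ_i are assumptions introduced only for illustration; n, N, x_i, and y_i follow the definitions above.

```latex
% Training: find the separating hyperplane (w, b) with maximal margin,
% allowing slack \xi_i for training abstracts that violate the margin.
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;
  \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
  y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0, \qquad i = 1, \dots, n,

% Prediction: the label assigned to a new abstract with feature vector x.
\hat{y}(\mathbf{x}) = \operatorname{sign}\!\left( \mathbf{w}^{\top} \mathbf{x} + b \right) \in \{-1, 1\},
\qquad \mathbf{w}, \mathbf{x} \in \mathbb{R}^{N}.
```

Here C > 0 trades margin width against training errors; a label of 1 would correspond to an abstract containing HPI information and -1 to one without, under the two-category setup described above.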
Publication date: 2012